Development of a Machine Learning-Based Prediction Model for Chemotherapy-Induced Myelosuppression in Children with Wilms’ Tumor

Simple Summary
Wilms’ tumor is the most common renal malignant tumor in children, and chemotherapy is an indispensable part of the treatment for most Wilms’ tumor patients. Chemotherapy-induced myelosuppression is the most common and serious toxicity of chemotherapy, which can hinder the process of chemotherapy and even endanger life. However, there is a lack of tools to predict chemotherapy-induced myelosuppression. We herein develop a model based on machine learning that can effectively predict the risk of chemotherapy-induced myelosuppression in children with Wilms’ tumor, offering the possibility to identify children with high risk of chemotherapy-induced myelosuppression early and take preventive strategies.

Abstract
Purpose: Develop and validate an accessible prediction model using machine learning (ML) to predict the risk of chemotherapy-induced myelosuppression (CIM) in children with Wilms’ tumor (WT) before chemotherapy is administered, enabling early preventive management. Methods: A total of 1433 chemotherapy cycles in 437 children with WT who received chemotherapy in our hospital from January 2009 to March 2022 were retrospectively analyzed. Demographic data, clinicopathological characteristics, hematology and blood biochemistry baseline results, and medication information were collected. Six ML algorithms were used to construct prediction models, and the predictive efficacy of these models was evaluated to select the best model to predict the risk of grade ≥ 2 CIM in children with WT. A series of methods, such as the area under the receiver operating characteristic curve (AUROC), the calibration curve, and the decision curve analysis (DCA) were used to test the model’s accuracy, discrimination, and clinical practicability. Results: Grade ≥ 2 CIM occurred in 58.5% (839/1433) of chemotherapy cycles. Based on the results of the training and validation cohorts, we finally identified that the extreme gradient boosting (XGB) model has the best predictive efficiency and stability, with an AUROC of up to 0.981 in the training set and up to 0.896 in the test set. In addition, the calibration curve and the DCA showed that the XGB model had the best discrimination and clinical practicability. The variables were ranked according to the feature importance, and the five variables contributing the most to the model were hemoglobin (Hgb), white blood cell count (WBC), alkaline phosphatase, coadministration of highly toxic chemotherapy drugs, and albumin. Conclusions: The incidence of grade ≥ 2 CIM was not low in children with WT, which needs attention. The XGB model was developed to predict the risk of grade ≥ 2 CIM in children with WT for the first time. The model has good predictive performance and stability and has the potential to be translated into clinical applications. Based on this modeling and application approach, the extension of CIM prediction models to other pediatric malignancies could be expected.


Introduction
Wilms' tumor (WT) is the most common renal malignancy in children and has the second highest incidence of pediatric primary abdominal malignancies. Although multidisciplinary treatments have advanced, recurrence occurs in approximately 15% of children with WT, and the survival rate after recurrence is only about 50% [1][2][3]. As the surgical resection of pediatric tumors is often difficult, chemotherapy is an indispensable part of the treatment for most WT patients.
However, chemotherapy drugs have many toxicities and side effects. Chemotherapy-induced myelosuppression (CIM) is the most common and severe toxicity of chemotherapy for tumors, typically manifesting as anemia, neutropenia, thrombocytopenia, and/or lymphopenia [4][5][6][7], leading to an increased risk of life-threatening infection, fatigue, and potential bleeding [8,9]. CIM often forces children to interrupt or postpone their chemotherapy course, severely compromising the effectiveness of treatment and even leading to death due to CIM-related complications. Studies have reported that the mortality rate related to grade 4 CIM can reach 4-12% [10]. Therefore, early identification of children at high risk of CIM and timely implementation of corresponding preventive and therapeutic measures can not only improve the effectiveness of tumor treatment, but also significantly reduce the disease burden caused by the related complications [11].
Studies have shown that risk factors for CIM include age, nutritional status, poor liver and kidney function, low baseline white blood cell count (WBC), chemotherapy cycles, etc. [12][13][14][15]. Various mathematical models for predicting CIM or febrile neutropenia (FN) have been proposed [16][17][18] and successfully applied to predict dynamic changes in neutrophil count [19,20]. However, these studies focused on predicting the risk of FN in adult tumors such as breast cancer, small cell lung cancer, and colorectal cancer [14,21,22].
The predictors of CIM in pediatric malignant solid tumors, especially in WT, have not been reported. In addition, most of the pharmacokinetic mathematical models developed in these studies focus on predicting CIM/FN caused by a single drug, making them difficult to extend to pediatric tumors requiring multidrug combination therapy. Moreover, the application of these models requires repeated and frequent monitoring of changes in hematological parameters and drug concentrations. Such invasive tests are often unacceptable to children and parents [20,22], and the more limited economic and medical resources in developing countries make such monitoring strategies even more difficult to implement.
Therefore, CIM or FN prediction models reported in the existing studies are difficult to widely apply to predict CIM in children with WT. It is necessary to develop a CIM prediction model for children with WT that is easy to use and has good prediction efficiency.
At present, artificial intelligence (AI) has been widely applied in the medical field. Machine learning (ML), as a branch of AI, can overcome the shortcomings of traditional logistic regression and mathematical models, and has a strong ability for feature recognition, classification, and prediction [23]. The models established based on machine learning have been successfully used in predicting the prognosis of various tumors or diseases, which presented good predictive ability [24][25][26]. Shibahara et al. collected pretreatment clinical data of glioma patients treated with nimustine hydrochloride (ACNU), and further successfully established a prediction model of CIM using machine learning, as well as describing the relationship between myelosuppression and hematopoietic stem cells (HSCs) [27]. In our study, various premedication clinical data in each chemotherapy cycle of WT children with a large sample size from the clinical big data platform of our hospital were collected, including blood cells baseline level, liver and kidney function indicators, tumor stage, body weight, body surface area and other variables, and six ML algorithms were used to construct CIM prediction models. Meanwhile, further evaluation of each model was carried out to select the model with the best prediction performance, which can help doctors identify children with WT at high risk of CIM early and develop individualized strategies for prevention, treatment, and follow-up to reduce the disease burden and improve prognosis.

Patients
The data of patients with WT who received chemotherapy in our hospital from January 2009 to March 2022 were collected from our hospital's clinical big data platform. Inclusion criteria: (1) younger than 18 years old; (2) patients diagnosed with WT; (3) patients having received at least one cycle of chemotherapy; (4) patients having received at least one routine blood test and biochemical blood test before and after chemotherapy. Exclusion criteria: (1) patients with other hematologic diseases or a history of HIV infection or stem cell transplantation; (2) patients with incomplete medical records (missing more than 50% of variables used for analysis); (3) patients with treatment interruption.

General Variables
Variables such as demographic data, clinicopathological characteristics, the laboratory examination, and medication information after each admission were collected as follows: age, gender, height, weight, tumor stage, COG grade, the routine hematologic index and biochemical index, routine urinalysis, the type of chemotherapy drugs used, chemotherapy cycles, etc.

Derived Variables
Coadministration of highly toxic chemotherapy drugs refers to any high hematologic toxicity chemotherapy drugs used during that chemotherapy cycle.

Quality Control of Samples
Each chemotherapy cycle of each WT patient was taken as a separate sample. The proportion of missing characteristic variables was counted for each sample, and 50% was selected as the threshold value according to the distribution of missingness across samples and the modeling requirements. If 50% or more of the characteristic variables of a sample were missing, the sample was considered to have seriously missing data and met the exclusion criteria.

Imputation Methods of Missing Data
For clinical characteristic variables, after the sample size was determined, the missing rate of each characteristic variable was checked, and 20% was selected as the threshold according to the modeling requirements. If the missing rate of a characteristic variable exceeded 20%, the variable was deleted and not included in the model construction.
Other missing categorical variables were imputed with the mode while missing continuous variables were imputed with the median. In addition, chemotherapy drugs with a relative frequency of medication less than 5% were also deleted and not included in the model construction (relative frequency of medication = frequency of drug use/total sample size).
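The quality-control and imputation rules above can be illustrated with a minimal sketch. It uses only the Python standard library; the function names, the use of `None` as the missing-value marker, and the toy variable names are assumptions made for illustration and are not taken from the study's code.

```python
from statistics import median, mode

MISSING = None  # assumed marker for a missing value


def missing_rate(values):
    """Fraction of entries that are missing."""
    return sum(v is MISSING for v in values) / len(values)


def drop_sparse_samples(rows, threshold=0.5):
    """Keep samples (dicts) in which fewer than `threshold` of the variables are missing."""
    return [r for r in rows if missing_rate(list(r.values())) < threshold]


def impute(rows, continuous, categorical, var_threshold=0.2):
    """Drop variables whose missing rate exceeds var_threshold, then fill the rest:
    median for continuous variables, mode for categorical variables."""
    cleaned = [dict(r) for r in rows]
    for name in continuous + categorical:
        column = [r[name] for r in cleaned]
        if missing_rate(column) > var_threshold:
            for r in cleaned:
                del r[name]
            continue
        observed = [v for v in column if v is not MISSING]
        fill = median(observed) if name in continuous else mode(observed)
        for r in cleaned:
            if r[name] is MISSING:
                r[name] = fill
    return cleaned
```

A typical call would be `impute(drop_sparse_samples(rows), continuous=[...], categorical=[...])`, applied once the set of eligible chemotherapy cycles has been fixed.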

Datasets and Algorithms
Extreme gradient boosting (XGB), logistic regression (LR), random forest (RF), least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), and CatBoost were used to establish the ML model. R version 4.2.0 and Python version 3.7 were used for model construction and statistical analysis. Stratification was performed according to the outcome, and the data set was randomly divided into the training set and the test set at a 7:3 ratio.
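A stratified 7:3 split of this kind can be sketched as follows. This is a standard-library illustration rather than the authors' implementation; the function name and the seed are hypothetical, and the count of 594 cycles without grade ≥ 2 CIM is derived from the reported 1433 − 839.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) so that each outcome class is split at the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# 839 cycles with grade >= 2 CIM (label 1) and 594 without (label 0).
labels = [1] * 839 + [0] * 594
train_idx, test_idx = stratified_split(labels)
```

Splitting within each outcome class keeps the proportion of grade ≥ 2 CIM cycles nearly identical in the training and test sets.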

Original Variables and Variable Selection
Information value (IV) was used as a correlation indicator; it measures the difference in the distribution of a variable between the two outcome groups and thereby characterizes the predictive ability of the variable for the outcome [31]. The threshold value of IV was set as 0.2, and variables with an IV less than 0.2 were deleted. Since the chemotherapy cycle and the type of chemotherapy drugs have been confirmed to be related to the occurrence of CIM, these two variables were included in the model even though their IVs were less than 0.2.
For the selected variables related to the outcome, the absolute value of the correlation coefficient was calculated to examine collinearity, and the threshold was set as 0.8. Among collinear variables exceeding this threshold, the variable with the smaller IV was deleted.
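The two screening steps can be sketched in a few lines. The binning scheme (equal-frequency bins), the 0.5 count smoothing, the use of the Pearson coefficient, and the greedy order of removal are all assumptions made for this illustration; the text above does not specify them.

```python
import math


def information_value(values, labels, n_bins=5):
    """IV of a continuous variable for a binary outcome (1 = event), equal-frequency bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_event = sum(labels)
    n_nonevent = len(labels) - n_event
    iv = 0.0
    for b in range(n_bins):
        chunk = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        events = sum(labels[i] for i in chunk)
        nonevents = len(chunk) - events
        # Add 0.5 to each count so an empty cell does not produce log(0) (assumed smoothing).
        p_event = (events + 0.5) / (n_event + 0.5 * n_bins)
        p_nonevent = (nonevents + 0.5) / (n_nonevent + 0.5 * n_bins)
        iv += (p_event - p_nonevent) * math.log(p_event / p_nonevent)
    return iv


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_collinear(columns, ivs, threshold=0.8):
    """Visit variables from highest to lowest IV; keep one only if |r| <= threshold
    with every variable already kept, so the lower-IV member of a collinear pair is dropped."""
    selected = []
    for name in sorted(columns, key=lambda n: ivs[n], reverse=True):
        if all(abs(pearson(columns[name], columns[s])) <= threshold for s in selected):
            selected.append(name)
    return selected
```

Each term of the IV sum is non-negative, so a variable whose distribution is the same in both outcome groups scores close to zero, while a well-separated variable scores far above the 0.2 threshold.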

Modelling Procedure
Fivefold cross-validation (CV) was used to divide the training set into a CV training set and a CV validation set, and the optimal hyperparameters of the model were obtained using Bayesian optimization. With the optimal hyperparameters, the model was trained again on the entire training set to obtain the final model, and the prediction performance of the models was then evaluated on the training set and the test set.
The area under the receiver operating characteristic (ROC) curve (AUC), sensitivity (TPR), specificity (TNR), accuracy (ACC), and precision (PPV) were used to characterize the fitting and accuracy of the model. The population stability index (PSI) was used to measure the stability of the model between the training set and the validation set [32] (PSI < 0.1: the model is stable; PSI 0.1~0.25: the model is slightly unstable; PSI > 0.25: the model is unstable). The Hosmer-Lemeshow test was used to assess the calibration of the models. Decision curve analysis (DCA) was used to evaluate the clinical utility of these models. Moreover, the importance weights of the features in the final model were used to rank feature importance.
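As an illustration of these evaluation measures, the sketch below computes the confusion-matrix metrics and a PSI between two sets of predicted probabilities. The equal-width binning on [0, 1], the number of bins, and the small floor for empty bins are assumed conventions, not details reported by the study.

```python
import math


def confusion_metrics(y_true, y_pred):
    """Sensitivity (TPR), specificity (TNR), accuracy (ACC), and precision (PPV)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "ACC": (tp + tn) / len(pairs),
        "PPV": tp / (tp + fp),
    }


def psi(expected_scores, actual_scores, n_bins=10):
    """Population stability index between two score distributions on [0, 1]."""
    def proportions(scores):
        counts = [0] * n_bins
        for s in scores:
            counts[min(int(s * n_bins), n_bins - 1)] += 1
        # A small floor avoids log(0) for empty bins (assumed convention).
        return [max(c / len(scores), 1e-6) for c in counts]

    expected = proportions(expected_scores)
    actual = proportions(actual_scores)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

Here `expected_scores` would be the predicted probabilities on the training set and `actual_scores` those on the held-out set; identical distributions give a PSI of zero, and the value grows as the two distributions diverge.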

Clinical Application of the Model
In order to translate the research results into clinical practice, the model was presented and applied in our hospital information system (HIS) in the form of a clinical decision support system (CDSS). After the first hematological examination for each patient, the doctor preliminarily confirms the medication regimen, at which point the system backstage automatically extracts the relevant data from the HIS into the model, then calculates the risk value and presents it in the CDSS. "Risk Scoring" is one of the essential modules. A patient's risk score was calculated as the final model score × 100, where the low-medium risk cutoff was set according to negative predictive value (NPV) = 0.8 and the medium-high risk cutoff was set according to positive predictive value (PPV) = 0.9. That is, the cutoff value for low-medium risk should ensure a negative prediction rate of >80% for low-risk patients, and the cutoff value for medium-high risk should ensure a positive prediction rate of >90% for high-risk patients.
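One way such cutoffs could be derived from scored historical cycles is sketched below. The search procedure, the function names, and the toy data in the usage note are assumptions for illustration only; the study does not describe how the cutoffs were computed.

```python
def risk_cutoffs(scores, labels, npv_target=0.8, ppv_target=0.9):
    """Return (low_cut, high_cut) on the 0-100 score scale.

    low_cut:  largest observed score c such that cycles with score < c meet the NPV target.
    high_cut: smallest observed score c such that cycles with score >= c meet the PPV target.
    Either value is None when no candidate reaches its target.
    """
    candidates = sorted(set(scores))
    low_cut = high_cut = None
    for c in candidates:  # ascending, so the last qualifying value is the largest
        below = [y for s, y in zip(scores, labels) if s < c]
        if below and below.count(0) / len(below) >= npv_target:
            low_cut = c
    for c in reversed(candidates):  # descending, so the last qualifying value is the smallest
        above = [y for s, y in zip(scores, labels) if s >= c]
        if above.count(1) / len(above) >= ppv_target:
            high_cut = c
    return low_cut, high_cut


def risk_band(score, low_cut, high_cut):
    """Map a 0-100 risk score to a low, medium, or high risk band."""
    if score < low_cut:
        return "low"
    if score >= high_cut:
        return "high"
    return "medium"
```

For example, with hypothetical scores `[5, 10, 20, 30, 40, 50, 60, 70, 80, 90]` and outcomes `[0, 0, 0, 0, 1, 0, 1, 0, 1, 1]`, the sketch yields cutoffs of 60 and 80, so a score of 70 falls in the medium band. On noisy data the two cutoffs can coincide or cross, which a production system would need to handle explicitly.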
To further improve the intuitiveness, accessibility, and practicability of the model, a brief description and the scoring basis of the model were presented in the CDSS, and the "Historical Trend" module was added to show the occurrence of CIM in previous admissions. In addition, the system can provide recommendations for possible prevention or intervention strategies based on the model scores.

Statistical Analysis Methods
Continuous variables were described as the median (lower and upper quartiles), and categorical variables were described as frequency and percentage. The Wilcoxon rank sum test and the chi-square test were used to compare the differences between groups for continuous variables and categorical variables, respectively. p < 0.05 was considered statistically significant.
The entire modeling procedure is shown in Figure 1.

Description of Baseline Characteristics
On our hospital's clinical big data platform, 437 WT patients receiving chemotherapy were retrieved, with a total of 1478 chemotherapy cycles. According to the inclusion and exclusion criteria, 45 samples were excluded, resulting in a final sample size of 1433. According to the National Cancer Institute Common Terminology Criteria for Adverse Events (CTCAE) version 5.0, grade ≥ 2 CIM was defined as meeting at least one of the following four criteria after chemotherapy: (1) WBC < 3.0 × 10^9/L; (2) absolute neutrophil count (ANC) < 1.5 × 10^9/L; (3) hemoglobin level (Hgb) < 100 g/L; (4) platelet count (PLT) < 75 × 10^9/L. The baseline characteristics of all patients and the comparison of baseline characteristics of patients in different datasets are shown in Table 1, and the comparison of baseline characteristics of patients with and without grade ≥ 2 CIM is shown in Table 2.
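The outcome definition above amounts to a simple rule over four post-chemotherapy blood counts, which can be written as a short sketch. The function name and argument names are hypothetical; the thresholds are those stated in the text.

```python
def is_grade2_or_higher_cim(wbc, anc, hgb, plt):
    """Return True if any of the four criteria for grade >= 2 CIM is met.

    wbc, anc, plt: post-chemotherapy counts in units of 10^9/L
    hgb:           post-chemotherapy hemoglobin in g/L
    """
    return wbc < 3.0 or anc < 1.5 or hgb < 100 or plt < 75
```

Because the criteria are combined with "or", a single abnormal lineage is enough to label the chemotherapy cycle as a grade ≥ 2 CIM event.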

Selection of Variables during Modeling
Matching the patient's first laboratory examination index after admission, a total of 46 clinically relevant characteristic variables were extracted, of which six characteristic variables (absolute value of basophils, percentage of basophils, cholinesterase, prealbumin, bile acids, and urine pH) had a missing rate of more than 20% and were excluded. Finally, 40 clinical characteristic variables were incorporated into the model for further screening, as shown in Table 3.

Selection of Chemotherapy Drugs
The relative frequency of the use of each chemotherapy drug is shown in Table 4, among which bleomycin, fluorouracil, topotecan, vindesine, and ifosfamide were excluded because the relative frequency of use was less than 5% and significantly different from that of other drugs. Thus, a total of nine variables including cisplatin, doxorubicin, epirubicin, carboplatin, etoposide, actinomycin D, cyclophosphamide, and vincristine, as well as the coadministration of highly toxic chemotherapy drugs, were incorporated into the final model.

Variables Finally Selected for the Model
According to the selection criteria of predictive variables, 19 variables finally incorporated into the model are shown in Table 5. In order to improve the interpretability of the final model (XGB), we ranked the feature importance of the incorporated variables. The five variables contributing the most to the model were hemoglobin (Hgb), white blood cell count (WBC), alkaline phosphatase, coadministration of highly toxic chemotherapy drugs, and albumin, as shown in Figure 2.

Evaluation of the Model
The fitting effect and authenticity evaluation results of each model are shown in Figure 3, Tables 6 and 7, respectively. The results show that the XGB model has the best fitting effect, the largest AUC (training set: 0.981, test set: 0.896), good sensitivity (76.2%), and specificity (93.2%), and better stability. In the XGB model, the feature importance of each variable is shown in Figure 2. The five variables that contribute the most to the model are Hgb, WBC, alkaline phosphatase, coadministration of highly toxic chemotherapy drugs, and albumin. In addition, the XGB model showed the best calibration in the comparison of calibration curves of other models (Figure 4). DCA showed that the XGB model can contribute to clinical decision-making (Figure 5).

Figure 2. The ranking of feature importance in the XGB model. Briefly, the importance weight of a feature is the sum of the number of its occurrences in all decision trees. In other words, the more a feature is used to build a decision tree in the model, the higher its importance weight will be. Hgb: hemoglobin; WBC: white blood cell count; ALP: alkaline phosphatase; RBC: red blood cell count; MCHC: mean corpuscular hemoglobin concentration; PLT: platelet count; RDW: red blood cell distribution width.

Clinical Application of the Model
Through a series of evaluations of the model, the XGB model with the best predictive efficacy was selected, presented, and applied in our hospital's HIS in the form of CDSS. It includes modules such as the risk scoring and scoring basis of grade ≥ 2 CIM, model description, historical trend of the previous occurrence of CIM, and management recommendations ( Figure 6). The predictive model is currently running smoothly in the HIS. Moreover, to better demonstrate how our model works in reality and to further elaborate on the clinical applicability of the model, we ran the model in our hospital HIS to assess the risk of CIM in a particular child (Supplementary Materials).

Figure 6. The interface of the CIM prediction model in the form of CDSS applied in our hospital HIS. AI Evaluation: the "AI Evaluation" module shows the risk scores of patients with grade ≥ 2 CIM calculated by the model, with the corresponding "protective factors" and "risk factors" listed below. Historical Trend: the "Historical Trends" module records the occurrence of CIM in previous chemotherapy cycles. Model Description: this module provides a detailed description of the applicable conditions and the model results. Management Recommendations: according to the prediction results of the model, the management suggestions automatically output by the system backstage are displayed in this module. References: this module presents some references.

CIM Is Not Rare during the Treatment of Children with WT
Chemotherapy is one of the important means of treating tumors. Currently, most chemotherapy drugs exert their effects through cytotoxicity. Cells with strong proliferative activity may be more sensitive to chemotherapy drugs, making drugs more likely to damage hematopoietic stem cells or blood cell precursors, leading to severe CIM [27,33]. A clinical consensus is that grade ≥ 2 CIM requires close monitoring and even timely intervention. Identifying patients at high risk of grade ≥ 2 CIM before chemotherapy is administered can guide doctors to closely monitor changes in blood cell levels and to administer granulocyte colony-stimulating factor (G-CSF) and other drugs in time to prevent CIM, which avoids interruption of the chemotherapy course and more serious CIM-related complications [34,35]. This is also why we chose the occurrence of grade ≥ 2 CIM as the outcome indicator. In this study, grade ≥ 2 CIM occurred in 58.5% (839/1433) of chemotherapy cycles. Although Castagnola et al. reported that the incidence of FN in children with central nervous system tumors was 27% [36], the outcome of that study was FN rather than CIM, and the different tumor types studied may also affect the incidence of FN, so our findings cannot be compared with theirs. Other studies have reported that the incidence of FN is 13-21% in solid tumors and about 33% in hematologic tumors [37][38][39]. However, the outcome indicator in most of these studies was FN and the subjects were adults, so these figures cannot be compared with the incidence of CIM in our study either. This also emphasizes that the incidence of CIM in children with solid tumors is still unknown and that more studies are needed to fill the gap.
In addition, more than half of the chemotherapy cycles in our study presented grade ≥ 2 CIM, which fully demonstrates that CIM is not rare in treating pediatric tumors, especially WT, and the development of early prediction models for CIM in children with solid tumors is indeed necessary.

Contribution of Variables to Model Prediction Results
According to the ranking of IV, 19 variables were finally included in the model. Studies have shown that chemotherapy cycles and regimens can affect the occurrence of CIM, so even if the IVs of those relevant variables were less than 0.2, they were still included in our model. Feature importance is an indicator to measure the contribution of each variable to the model's predictive result (Figure 2). In the XGB model, the Hgb level ranked first in the feature importance ranking. This seems to differ from what most studies have reported. More than one study reported that baseline WBC and ANC levels, but not Hgb levels, were the most critical risk factors for CIM or FN [14,40,41]. However, it has also been reported that a low baseline level of Hgb was associated with CIM in elderly tumor patients [42]. It has been reported that in addition to Hgb, decreases in alkaline phosphatase, red blood cell count (RBC), and mean corpuscular hemoglobin concentration (MCHC) and an increase in red blood cell distribution width (RDW) can also reflect anemia or hematopoietic abnormalities to some extent [27,33]. Herein, all of these five indicators except RDW were lower in the CIM group than in the non-CIM group. This may be because most of the children in this study underwent surgery before chemotherapy, and inevitable intraoperative bleeding and the consumption of the body by the tumor led to a lower baseline Hgb or RBC level before chemotherapy. Stimulated by blood loss, the proliferation of bone marrow hematopoietic cells may be more active, and these cells may thus be more likely to be attacked by chemotherapy drugs.
Although Aagaard et al. did not find that low levels of WBC and ANC were associated with the development of bone marrow suppression in their study [43], most studies have shown that low baseline WBC and ANC levels are risk factors for myelosuppression [12][13][14], and our findings are consistent with them: low baseline levels of WBC and ANC in the XGB model strongly predict CIM. Because granulocytes have a short lifespan, it is difficult for hematopoietic stem cells or a hematopoietic microenvironment damaged by chemotherapy drugs to generate new granulocytes to replace those consumed [27,30]. Hence, a low ANC level is often the earliest manifestation of CIM. Lower baseline WBC or ANC levels mean lower granulocyte reserves, meaning CIM is more likely to occur.
In addition, the low baseline level of albumin may be related to the nutritional status of patients, thus affecting the occurrence of CIM, which is also consistent with the result of another study [44].
Moreover, different patients have different chemotherapy regimens [45][46][47], and different chemotherapy regimens incorporate chemotherapy drugs with different degrees of hematological toxicity [48,49], so treating each chemotherapy regimen as a variable is unrealistic. As a result, we added the variable "Coadministration of highly toxic chemotherapy drugs" to investigate the effect of highly toxic chemotherapy drugs on the risk of developing CIM. Although its IV was small, its feature importance ranked fourth in the XGB model. It validates that chemotherapy drugs with high hematotoxicity are indeed more likely to cause CIM. Unexpectedly, the ranking of feature importance of chemotherapy drugs in the model seems to be different from our understanding of hematological toxicity of chemotherapy drugs. Low hematologic toxicity drugs such as cisplatin and vincristine ranked even higher than high hematologic toxicity drugs such as doxorubicin and cyclophosphamide. This may be because drugs such as cisplatin and vincristine are more frequently used in chemotherapy regimens for children with WT and are often used in combination with other highly toxic chemotherapeutic drugs. Thus, the ranking of the feature importance of these variables may differ slightly from our general understanding of CIM risk factors. Nevertheless, the XGB model developed in this study still performed surprisingly well in predicting grade ≥ 2 CIM.

XGB Model Has Good Predictive Performance for Grade ≥ 2 CIM
Since the first mechanistic model based on pharmacokinetics and pharmacodynamics was developed, other mathematical models for predicting CIM or investigating the relationship between a chemotherapy drug and changes in blood cell levels have been developed one after another. These mathematical models can simulate hematopoiesis, granulocytopoiesis, myelosuppression, and leukemia cytodynamics. Recently published reviews have provided a comprehensive overview and summary of various models [50,51], and studies have reported associations between the occurrence of CIM and genomic specificity [52][53][54]. Of these models, the maximum AUC of the model predicting FN or CIM occurrence is only 0.83. Notably, after evaluating the fitting effects of several models used in our study, we found that the XGB model had an AUC of up to 0.981 in the training set and 0.896 in the test set, with satisfactory sensitivity and specificity, as well as good stability. The calibration curve and DCA also suggested that the XGB model had good calibration and could promote clinical decision-making. In addition to good predictive performance, the XGB model we developed has other advantages: the modeling variables we selected were from the baseline data of hematological and biochemical tests before chemotherapy, and the information about the proposed chemotherapy regimen. These variables are readily available prior to drug administration. Children do not need to bear the cost of expensive tests such as genomic marker detection, or the burden and pain caused by frequent laboratory tests.

Application of CIM Prediction Model in Clinical Practice
Translating clinical research results to clinical applications has been a significant challenge. The clinical decision support system (CDSS) helps doctors improve and enhance the efficiency of decision-making by providing systematic medical knowledge and in-depth analysis of medical records through a human-computer interaction model, thereby improving the quality of medical care [55]. CDSS is a vital bridge to facilitate the translation of clinical research into clinical application.
Considering the application scenarios of the CIM prediction model, we present the final model in the form of CDSS in our hospital HIS. Patients undergo hematological and biochemical tests after admission. The doctor then specifies the current chemotherapy regimen, followed by the system backstage immediately extracting the relevant data, calculating the CIM risk score through the model and outputting it via CDSS. Doctors can make appropriate treatment plans based on the predicted results. Besides the risk score module, the "Management Recommendations" module and the "Historical Trend" module that records the occurrence of CIM in previous chemotherapy cycles can greatly help doctors make better clinical decisions. To better demonstrate how our model works in reality and to further elaborate on the clinical applicability of the model, we ran the model in our hospital HIS to assess the risk of CIM in a particular child. Please refer to the Supplementary Materials (Figure S1) for sample cases and model results output interface.
By applying this approach, firstly, doctors can identify high-risk patients early and adopt appropriate management plans to improve patients' prognosis. Secondly, the model calculations and results output are carried out automatically by the system backstage, eliminating the inconvenience of other predictive modeling tools requiring manual data input for the corresponding variables. Thirdly, the relevant data of CIM occurrence in each admission will be automatically stored in the system, which will be helpful for other related clinical studies in the future. All of the above fully reflect the practicability, accessibility, and high predictive efficiency of our model in clinical application.

Limitations and Prospects
However, our study also has some limitations. Firstly, the nature of the retrospective study may inevitably introduce some selection bias; secondly, the risk factors related to CIM, such as prealbumin, BMI, bile acid, bilirubin, etc., which have been reported in other studies [40,56], were not included in the model due to a large amount of missing data. This may be because doctors or patients have insufficient awareness of CIM and do not conduct relevant tests. Thirdly, the dynamic changes in blood cells may be able to predict the specific time when CIM occurs and finding this time point will help doctors develop more accurate prevention strategies for CIM. However, these data were also missing in this study. In addition, our sample size needs to be expanded to make more accurate predictions for different grades of CIM. Furthermore, our model has been successfully piloted in HIS with CDSS, and more data needs to be collected prospectively to further verify the model's accuracy. Finally, different types of tumors may affect the occurrence of CIM, but only children with WT were included in this study. Therefore, the models that can be extended to other pediatric malignant solid tumors need further development. To summarize, a prospective clinical study with large samples and regularly collected data needs to be carried out. We are currently conducting animal experiments related to CIM in order to accurately predict the CIM by finding other more readily available indicators. We intend to validate these indicators in prospective clinical studies and incorporate them into the model for continuous calibration and optimization. Despite these limitations, to our knowledge, this study is the first to use ML algorithms to establish a predictive model for CIM in children with WT, achieving better predictive effects than other pharmacokinetic or mathematical models. 
Based on the construction method and clinical application approach of this ML model, a CIM prediction model that can be extended to other pediatric malignancies and facilitates widespread clinical applications can be expected.

Conclusions
The incidence of grade ≥ 2 CIM was not low in children with WT, which needs more attention. This study developed an ML-based prediction model to predict the risk of grade ≥ 2 CIM in WT children for the first time. The model has good predictive performance and stability and is also convenient for clinical application, which will help doctors identify patients at high risk of CIM earlier, and develop and implement individualized preventive medication strategies, thus reducing the disease burden and economic burden of CIM in children with WT. Based on this modeling and application approach, the extension of CIM prediction models to other pediatric malignancies is expected.

Informed Consent Statement: Informed consent was obtained from all subjects involved in this study.

Data Availability Statement:
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.